Handle RequestScoped instance injection gracefully for DefaultConfigurationStore#1758
Handle RequestScoped instance injection gracefully for DefaultConfigurationStore#1758HonahX merged 9 commits intoapache:mainfrom
Conversation
dimas-b
left a comment
There was a problem hiding this comment.
If CDI scopes are a problem for tasks, it may be worth considering making a different, @ApplicaitonScoped Configuration Store impl. for them (and inject it only into tasks). WDYT?
There was a problem hiding this comment.
I'm not sure this is a robust impl. for getConfiguration... The exception can be anything. Could we make conditions and error handling more specific?
There was a problem hiding this comment.
IIUC the config store can be user provided, so we can't anticipate many specifics about the exception type.
Perhaps we should make this RuntimeException though, so:
- We don't swallow things like interrupts
- The user-provided config store has the ability to not get its exception swallowed if it so chooses
There was a problem hiding this comment.
The originally fix is not correct, we now fixes the back group task getConfiguration by directly passing the reaml into the function
|
@dimas-b After some more investigation, actually, i don't think CDI injection here is a robust way to handle configuration store. The TaskExecutor was failing with error like ContextNotActiveException, the reason seems that CDI injection doesn't work well when the access is made outside the scoped, like background task, which is what TaskExecutor does. Since when handle task is called, it does have a callContext passed, like this It does mean it need to get the configuration of a particular realm, instead of the default. I think the real robust way is just passing the callContext to the call its-self, instead of use CDI injection for this case. in other words, have call like following and we will remove WDYT? |
|
In the longer run, I'd prefer to reduce (and eventually remove) the use of In the short term, I'll try and take a stab at this PR trying to solve the config access sub-cases in a CDI-native way. Please give me a couple of days. |
There was a problem hiding this comment.
The whole refactoring to use the new interface is fairly large and touches persistent due to the loadTasks API, we will beak the whole refactoring to three parts:
- this PR that introduces the new API that only requires realm, and fixes the task executor to unblock the release
- a separate PR to switch other non-persistent part to use this new API
- a separate PR to switch the loadTasks to use the new API
There was a problem hiding this comment.
-1 on adding this interface method because is creates an API skew WRT other methods, which do not have the realm parameter.
There was a problem hiding this comment.
I'd like to see this happen, if all we need here is actually the realm. This largely simplify the interface, which also avoid any complication of CDI. Also all callers of getConfiguration() will have the realm ready to pass in.
A side note: why would a new method have to be the same with other methods for its parameters?
There was a problem hiding this comment.
@dimas-b i am actually planning to completely remove the usage of the other API that doesn't take realm, but I need to move towards it step by step due to the complexity. I can mark the other ones as deprecated to avoid new usages, WDYT?
There was a problem hiding this comment.
@dimas-b is the concern actually we don't pass the whole RealmContext? i can definitely pass the whole RealmContext to the function if that solves the concern
There was a problem hiding this comment.
The concern is having materially different inputs in the various lookup methods. Specifically, if one method has a "realm" parameter, all of them should have it.
I'm fine with either RealmContext or String as the type of the "realm" parameter (slightly prefer RealmContext).
There was a problem hiding this comment.
why would a new method have to be the same with other methods for its parameters?
Not all parameters, but those that affect lookup results, but are not always specified in method calls. In this case - realm ID (since configuration can be different in different realms). This is what I meant by "API skew".
There was a problem hiding this comment.
+1 on using RealContext to avoid hard-wiring.
There was a problem hiding this comment.
@dimas-b For the getConfiguration function, i think it make sense to have realmContext, instead of PolarisCallContext since the configuration is per Realm. So i think in short term, it make sense to deprecate the usage of the calls that uses PolarisCallContext, and start using the realmContext one, i will do that as soon as possible.
in long run, we can definitely leverage CDI once we figure out how to propagate request scoped context in the background thread correctly.
That is correct and that is precisely why the following methods were created:
It is also because of that that From that point onwards, it's 100% safe to use that instance of The instance returned by Of these 4 beans, 3 are application-scoped and should be usable in any thread: Taking a step back: One could argue here that we should rely on context propagation done by Quarkus. And yes, that was my first intention when I ported the You'll notice it's a But unfortunately, And my experimentations with Quarkus showed that in this case, the That was the reason why I switched to manual copies. |
|
@gh-yzou can you share the stack trace of your |
There was a problem hiding this comment.
Should we just get rid of this function then? I know it's used in a lot of places but it sounds unsafe to use.
There was a problem hiding this comment.
The whole refactor actually turns out pretty big, and touches persistent, so I am breaking this into couple of PRs. There will be follow up PRs coming to completely remove this function if we are not able to figure out how to propogate the beans :
PR to switch non loadTasks API to use the new API
PR to update the loadTasks to use the new API
This is also in the PR description.
There was a problem hiding this comment.
Isn't realm implied by polarisCallContext?
There was a problem hiding this comment.
No, you need CallContext to access RealmContext and the realm ID.
There was a problem hiding this comment.
I agree that the realm ID is not conveniently available from PolarisCallContext, but PolarisCallContext has a BasePersistence instance, which is specific to a realm (cf. JdbcBasePersistenceImpl)
There was a problem hiding this comment.
Indeed. The boundaries between CallContext and PolarisCallContext are unclear. PolarisCallContext could store the realm ID since an instance of that type cannot be constructed without knowing the realm ID.
There was a problem hiding this comment.
Agree, i am always getting confused about what should be in CallContext, and what should be in PolarisCallConext. Maybe we should follow up to make a clear definition of those contexts.
There was a problem hiding this comment.
My concern is more about potentially confusing realm ID is this new realm parameter and the implicit realm ID inside PolarisCallContext (even if it does not hold it as a field).
There was a problem hiding this comment.
i see, i updated the getConfiguration to take realmContext, instead of realm. and updated this function to just take the whole CallContext to be more clear.
|
@adutra Yes, we were trying to leverage CDI to get rid of the thread local usage CallContext.getCurrentContext() in DefaultConfigurationStore, and didn't realize that it doesn't work with task executor. We actually only need the realm, not even the whole call context. This PR currently introduces the new method, i will have follow up PRs to remove the other usages step by step, the whole refactor is pretty big and touches persistent, i need to move step by step to make thing simplier |
OK, in this case for consistency with other components, maybe we could pass public <T> @Nullable T getConfiguration(@Nonnull RealmContext realmCtx, String configName) {This could also answer @dimas-b concerns voiced here: |
LGTM as far as this PR is concerned. In the long run, I still believe it is preferable to have |
|
The |
84327ad to
e417244
Compare
| * @return the current value set for the configuration key for the given realm, or null if not set | ||
| * @param <T> the type of the configuration value | ||
| */ | ||
| default <T> @Nullable T getConfiguration(@Nonnull RealmContext realmContext, String configName) { |
There was a problem hiding this comment.
If we go with RealmContext, I believe it should be present in all methods.
As discussed in another thread, PolarisCallContext implies a particular realm, but does not expose the realm ID. Therefore, consistent behaviour is not guaranteed across methods using PolarisCallContext and RealmContext.
-1 This is a critical issue for this PR from my POV.
There was a problem hiding this comment.
To be clear: having convenience lookup methods based on CallContext would be fine, because we have CallContext.getRealmContext()
There was a problem hiding this comment.
Removing -1 in the interest of making progress. There is only one implementation, which appears to processes both lookup call paths correctly (as far as I can tell from the diff). I'd be fine merging in this state after the other minor comments are addressed.
@gh-yzou : Would you mind opening a GH issue for followup?
There was a problem hiding this comment.
The PolarisCallContext is actually not used anywhere during getConfiguration, and only realmContext is needed. So i think we should deprecate the usage of the methods that uses PolarisCallContext, and i plan to remove the usages step by step in follow up PRs, remove all of them in one PR makes the change really big and touches persistent. i can mark the other APIs as deprecated for now if that helps, would that work
There was a problem hiding this comment.
I also added a TODO to all other functions that we will update them to use RealmContext instead of PolarisCallContext
There was a problem hiding this comment.
I'm ok with improving config lookup with a series of PRs. I'd still appreciate a GH issue for visibility. I believe it would be a 1.0 blocker due to large lookup API impact.
| * @param configName the name of the configuration key to check | ||
| * @return the current value set for the configuration key or null if not set | ||
| * @param <T> the type of the configuration value | ||
| * <p>This function needs to be used with caution, it can not be called outside of active |
There was a problem hiding this comment.
Please move this warning above parameter descriptions.
service/common/src/main/java/org/apache/polaris/service/task/TableCleanupTaskHandler.java
Show resolved
Hide resolved
sfc-gh-jojiang
left a comment
There was a problem hiding this comment.
LGTM! Thanks @gh-yzou for working on this and everyone for reviewing!
* Fix regression test docker setup for purge (apache#1768) * Use canonical catalog property names in tests (apache#1766) * In `PolarisPolicyServiceIntegrationTest` * In `PolarisRestCatalogIntegrationTest` Following up to apache#1557 * Unblock test `createViewWithCustomMetadataLocation` (apache#1320) * Add test for invalid custom metadata location * Add missing properties during table/view creation * main: Update docker.io/prom/prometheus Docker tag to v3.4.1 (apache#1767) * Testing: silence a bunch of harmless test warnings (apache#1773) * `OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended` * Hibernate Validator cannot instrument static methods (`Hibernate Validator does not support constraints on static methods yet. ...`) * ForkJoinPool test lifecycle warning * Couple of split-package warnings * Add unit test for legacy config lookup (apache#1774) Following up on apache#1766 * Handle RequestScoped instance injection gracefully for DefaultConfigurationStore (apache#1758) Although we do a check of !realmContextInstance.isUnsatisfied() it only checks if there is matching bean, however, it doesn't check if the context is active or not. Similarly isResolvable also don't check if the scope is active. Therefore if getConfiguration is called when there is no active scope, it will get Exception like ContextNotActiveException. This actually fails the TaskExecutor because it runs in a separate thread. In order to fix the problem, we introduces a new getConfiguration function to handle the background tasks, and also use isResolvable instead of isUnsatisfied to handle ambiguities. * Restructure the directory and package name for persistence modules (apache#1724) * Re-add missing parameters to create_table python API (apache#1778) It looks like this method lost the `prefix` and `namespace` parameters in apache#1347. They're re-introduced to the spec here, and then I've run: ``` redocly bundle spec/polaris-catalog-service.yaml -o spec/generated/bundled-polaris-catalog-service.yaml ./gradlew regeneratePythonClient ``` Then, some manual reverts: ``` alias gitrevert='git checkout upstream/main --' gitrevert client/python/.github/workflows/python.yml gitrevert client/python/.gitlab-ci.yml gitrevert client/python/pyproject.toml ``` I still hope to automate this process as part of CI soon; see apache#1675 * Replace getConfiguration usage with PolarisCallContext to use RealmContext (PART 1) (apache#1780) * JDBC: Fix getting started config (apache#1781) Fix typo in the JDBC config for getting started, the config should be max_duration_in_ms instead of max_delay_in_ms * main: Pin dependencies (apache#1701) * main: Update dependency pytest to v8 (apache#1710) * main: Update dependency boto3 to v1.38.28 (apache#1777) * Run renovatebot only on the main branch (apache#1786) * INFO: last merged commit a827d26 --------- Co-authored-by: gh-yzou <167037035+gh-yzou@users.noreply.github.com> Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com> Co-authored-by: Liam Bao <liam.zw.bao@gmail.com> Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: Yufei Gu <yufei@apache.org> Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com> Co-authored-by: Prashant Singh <35593236+singhpk234@users.noreply.github.com> Co-authored-by: JB Onofré <jbonofre@apache.org>
…rationStore (apache#1758) (apache#47) Although we do a check of !realmContextInstance.isUnsatisfied() it only checks if there is matching bean, however, it doesn't check if the context is active or not. Similarly isResolvable also don't check if the scope is active. Therefore if getConfiguration is called when there is no active scope, it will get Exception like ContextNotActiveException. This actually fails the TaskExecutor because it runs in a separate thread. In order to fix the problem, we introduces a new getConfiguration function to handle the background tasks, and also use isResolvable instead of isUnsatisfied to handle ambiguities. Co-authored-by: gh-yzou <167037035+gh-yzou@users.noreply.github.com>
Although we do a check of !realmContextInstance.isUnsatisfied() it only checks if there is matching bean, however, it doesn't check if the context is active or not. Similarly isResolvable also don't check if the scope is active. Therefore if getConfiguration is called when there is no active scope, it will get Exception like ContextNotActiveException.
This actually fails the TaskExecutor because it runs in a separate thread.
In order to fix the problem, we introduces a new getConfiguration function to handle the background tasks, and also use isResolvable instead of isUnsatisfied to handle ambiguities.
Tests:
or runs into prelivege issue when directly using rest call
follow up this in a separate branch here https://github.com/gh-yzou/polaris/pull/new/yzou-purge-spark-integration
Follow up tasks: